Project: Communicate Data Findings - [Ford GoBike System Data]

Table of Contents

Introduction

Dataset Description

This dataset includes information about individual rides made in a bike-sharing system covering the greater San Francisco Bay Area.

Data Wrangling and Cleaning

  1. Reading CSV
  2. Dropping Unnecessary Columns
  3. Dropping Rows With Duplicates and Null Values
  4. Inspect and Fix Data Types

1. Reading CSV file
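
A minimal sketch of this step, assuming pandas is used. The in-memory CSV below is a made-up stand-in for the real file, and the column names (`duration_sec`, `start_time`, `user_type`) are assumptions about the dataset's layout.

```python
import io

import pandas as pd

# Made-up stand-in for the real CSV file; in practice the path to the
# downloaded dataset would be passed to pd.read_csv instead.
sample_csv = io.StringIO(
    "duration_sec,start_time,user_type\n"
    "600,2019-02-28 17:32:10,Subscriber\n"
    "1200,2019-02-28 08:05:44,Customer\n"
)

df = pd.read_csv(sample_csv)

# A first look at size and inferred types before any cleaning.
print(df.shape)
print(df.dtypes)
```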

2. Dropping Unnecessary Columns

I will drop the columns that I am not going to use in my analysis.
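
One way to do this is to keep the unused column names in a list and pass it to `DataFrame.drop`. The columns listed here are only illustrative assumptions; the actual list depends on the analysis.

```python
import pandas as pd

# Toy frame with assumed column names, for illustration only.
df = pd.DataFrame({
    "duration_sec": [600, 1200],
    "start_station_latitude": [37.78, 37.79],
    "start_station_longitude": [-122.40, -122.41],
    "user_type": ["Subscriber", "Customer"],
})

# Hypothetical list of columns that the analysis does not need.
unused_columns = ["start_station_latitude", "start_station_longitude"]
df = df.drop(columns=unused_columns)

print(df.columns.tolist())
```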

3. Dropping Rows With Duplicates and Null Values
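
A short sketch of this step on toy data. It assumes that any row with a missing value should be removed, which may be stricter than needed if only some columns matter.

```python
import numpy as np
import pandas as pd

# Toy data: row 1 duplicates row 0, and row 2 has a missing birth year.
df = pd.DataFrame({
    "duration_sec": [600, 600, 1200, 900],
    "member_birth_year": [1985.0, 1985.0, np.nan, 1990.0],
})

print(df.duplicated().sum())  # number of fully duplicated rows
print(df.isna().sum())        # missing values per column

# Remove duplicates first, then rows with any null, and renumber the index.
df = df.drop_duplicates().dropna().reset_index(drop=True)
```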

4. Inspect and Fix Data Types

Some data types need to be converted.

Since this data covers only February, I could hardcode the month, but I will derive it from the data so that the same code still works for a dataset from another month.
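
A sketch of the conversion, assuming the timestamp column is called `start_time` and the user type column is `user_type`. The month is read from the timestamps rather than hardcoded; `start_month` is a hypothetical column name.

```python
import pandas as pd

# Toy data with assumed column names; timestamps arrive as plain strings.
df = pd.DataFrame({
    "start_time": ["2019-02-28 17:32:10", "2019-02-01 08:05:44"],
    "user_type": ["Subscriber", "Customer"],
})

# Parse timestamps and store the low-cardinality column as a category.
df["start_time"] = pd.to_datetime(df["start_time"])
df["user_type"] = df["user_type"].astype("category")

# Derive the month from the data so other months need no code change.
df["start_month"] = df["start_time"].dt.month_name()

print(df.dtypes)
```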

While inspecting the data, I found that it gains better context if we add columns for working hours, weekdays, and age.
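
The extra columns could be derived roughly as follows. The names `start_hour`, `start_weekday`, and `age` are hypothetical, and age is approximated as ride year minus an assumed `member_birth_year` column.

```python
import pandas as pd

# Toy data with assumed column names.
df = pd.DataFrame({
    "start_time": pd.to_datetime(["2019-02-28 17:32:10", "2019-02-02 08:05:44"]),
    "member_birth_year": [1985, 1990],
})

# Hour of day (0-23) and weekday name, taken from the ride start time.
df["start_hour"] = df["start_time"].dt.hour
df["start_weekday"] = df["start_time"].dt.day_name()

# Approximate age at the time of the ride.
df["age"] = df["start_time"].dt.year - df["member_birth_year"]

print(df[["start_hour", "start_weekday", "age"]])
```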

We can see that the maximum age is 141 years, which seems unrealistic.
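
One possible way to handle such values is to filter on a plausibility cutoff. The threshold of 100 years below is an assumption chosen for illustration, not a value taken from the analysis.

```python
import pandas as pd

# Toy ages, including one implausible value like the one described above.
df = pd.DataFrame({"age": [25, 34, 141, 60]})

# Summary statistics make the outlier visible in the max row.
print(df["age"].describe())

# Assumed cutoff: treat ages above 100 as data-entry errors.
MAX_PLAUSIBLE_AGE = 100
df = df[df["age"] <= MAX_PLAUSIBLE_AGE].reset_index(drop=True)
```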

The dataset is now clean and ready to be analysed.

Univariate Exploratory Data Analysis and Conclusions

Research Question 1: At what time of the day is the demand highest?
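
A sketch of how this question could be approached: count rides per start hour and pick the largest. The `start_hour` column is the hypothetical derived column assumed earlier, and the values are toy data.

```python
import pandas as pd

# Toy data: one row per ride, with the hour the ride started.
df = pd.DataFrame({"start_hour": [8, 8, 17, 17, 17, 12]})

# Rides per hour, ordered by hour so the result reads like a timeline.
rides_per_hour = df["start_hour"].value_counts().sort_index()

# Hour with the most rides.
peak_hour = rides_per_hour.idxmax()

print(rides_per_hour)
print(peak_hour)
```

The `rides_per_hour` series can then be drawn as a bar chart with whichever plotting library the notebook uses.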

Conclusion.

Conclusion.

Research Question 2: What are the top 10 busiest starting stations?
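
A minimal sketch for ranking stations, assuming the station name lives in a `start_station_name` column. `value_counts` sorts in descending order, so `head(10)` yields the ten busiest.

```python
import pandas as pd

# Toy data with placeholder station names.
df = pd.DataFrame({
    "start_station_name": ["A", "B", "A", "C", "A", "B"],
})

# Count rides per station and keep at most the ten most frequent.
top_stations = df["start_station_name"].value_counts().head(10)

print(top_stations)
```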

Conclusion.

Let's see at which hour this station is busiest.
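
This could be done by filtering the rides down to the busiest station and counting by hour. The column names `start_station_name` and `start_hour` are assumptions, and the data is a toy example.

```python
import pandas as pd

# Toy data with placeholder station names.
df = pd.DataFrame({
    "start_station_name": ["A", "A", "A", "B", "B"],
    "start_hour": [8, 17, 17, 9, 9],
})

# Identify the station with the most rides.
busiest_station = df["start_station_name"].value_counts().idxmax()

# Keep only that station's rides and count them per start hour.
station_rides = df[df["start_station_name"] == busiest_station]
hourly_counts = station_rides["start_hour"].value_counts().sort_index()

print(busiest_station)
print(hourly_counts)
```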

Conclusion.

Bivariate Exploratory Data Analysis and Conclusions

Research Question 3: How often is the service used by customers or subscribers?
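
A sketch of one way to compare the two groups: overall counts per user type, plus a cross-tabulation against weekday. It assumes a `user_type` column with the values `Customer` and `Subscriber` and the hypothetical `start_weekday` column.

```python
import pandas as pd

# Toy data with assumed column names and values.
df = pd.DataFrame({
    "user_type": ["Subscriber", "Subscriber", "Customer", "Subscriber", "Customer"],
    "start_weekday": ["Monday", "Monday", "Saturday", "Saturday", "Saturday"],
})

# Overall number of rides per user type.
type_counts = df["user_type"].value_counts()

# Rides per weekday, split by user type (missing combinations become 0).
by_weekday = pd.crosstab(df["start_weekday"], df["user_type"])

print(type_counts)
print(by_weekday)
```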

Conclusion.

Conclusion.

Research Question 4: At which hour do people make longer trips?
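
One way to look at this is the average trip duration per start hour. The sketch assumes a `duration_sec` column and the hypothetical `start_hour` column; the mean is used here, although a median would be less sensitive to a few very long trips.

```python
import pandas as pd

# Toy data with assumed column names.
df = pd.DataFrame({
    "start_hour": [3, 3, 8, 8, 17],
    "duration_sec": [1800, 2400, 600, 800, 700],
})

# Average duration of trips starting in each hour.
mean_by_hour = df.groupby("start_hour")["duration_sec"].mean()

# Hour whose trips are longest on average.
longest_hour = mean_by_hour.idxmax()

print(mean_by_hour)
print(longest_hour)
```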

Conclusion.

Multivariate Exploratory Data Analysis and Conclusions

Research Question 5: How do age and gender affect trip duration?
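
A possible sketch for this question: bin ages into groups and compare the mean duration per age group and gender. The bin edges, the labels, and the column names `age`, `member_gender`, and `duration_sec` are all assumptions made for illustration.

```python
import pandas as pd

# Toy data with assumed column names.
df = pd.DataFrame({
    "age": [22, 27, 35, 38, 52],
    "member_gender": ["Male", "Female", "Male", "Male", "Female"],
    "duration_sec": [600, 900, 500, 700, 1200],
})

# Assumed age bins; each interval excludes its left edge and includes its right.
age_bins = [18, 30, 45, 100]
age_labels = ["18-30", "31-45", "46-100"]
df["age_group"] = pd.cut(df["age"], bins=age_bins, labels=age_labels)

# Mean duration for each observed (age group, gender) combination.
mean_duration = df.groupby(
    ["age_group", "member_gender"], observed=True
)["duration_sec"].mean()

print(mean_duration)
```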

Conclusion.

Research Question 6: Is there any correlation between age, gender and user type?
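
Because gender and user type are categorical while age is numeric, a plain correlation coefficient does not apply directly. A summary table such as the median age per gender and user type is one way to explore the relationship. The column names and values below are assumptions on toy data.

```python
import pandas as pd

# Toy data with assumed column names.
df = pd.DataFrame({
    "age": [25, 35, 45, 30, 50, 40],
    "member_gender": ["Male", "Male", "Female", "Female", "Male", "Female"],
    "user_type": ["Customer", "Subscriber", "Subscriber",
                  "Customer", "Subscriber", "Subscriber"],
})

# Median age for every gender and user type combination.
median_age = df.pivot_table(
    values="age", index="member_gender", columns="user_type", aggfunc="median"
)

print(median_age)
```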

Conclusion.